125 research outputs found

A Novel Method for the Absolute Pose Problem with Pairwise Constraints

Author: Chen Guang
Knoll Alois
Li Xuechen
Liu Yinlong
Song Zhijian
Wang Manning
Publication venue: MDPI AG
Publication date: 28/03/2019

Absolute pose estimation is a fundamental problem in computer vision. It is a typical parameter estimation problem, so efforts to solve it will always suffer from outlier-contaminated data. Conventionally, for a fixed dimensionality d and N measurements, a robust estimation problem cannot be solved faster than O(N^d). Furthermore, it is almost impossible to remove d from the exponent of the runtime of a globally optimal algorithm. However, absolute pose estimation is a geometric parameter estimation problem, and thus has special constraints. In this paper, we consider pairwise constraints and propose a globally optimal algorithm for solving the absolute pose estimation problem. The proposed algorithm has linear complexity in the number of correspondences at a given outlier ratio. Concretely, we first decouple the rotation and translation subproblems by utilizing the pairwise constraints, and then we solve the rotation subproblem using a branch-and-bound algorithm. Lastly, we estimate the translation based on the known rotation by using another branch-and-bound algorithm. The advantages of our method are demonstrated via thorough testing on both synthetic and real-world data. Comment: 10 pages, 7 figures

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Learnable MFCCs for Speaker Verification

Author: Kinnunen Tomi
Liu Xuechen
Sahidullah Md
Publication date: 20/02/2021

We propose a learnable mel-frequency cepstral coefficient (MFCC) front-end architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to be adapted to data flexibly. In practice, we formulate data-driven versions of the four linear transforms of a standard MFCC extractor: windowing, discrete Fourier transform (DFT), mel filterbank and discrete cosine transform (DCT). The reported results reach up to 6.7% (VoxCeleb1) and 9.7% (SITW) relative improvement in terms of equal error rate (EER) over static MFCCs, without additional tuning effort. Comment: Accepted to ISCAS 202
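For orientation, the four linear transforms named in the abstract can be sketched as a static (non-learnable) MFCC extractor in NumPy. This is only an illustration of the conventional pipeline, not the authors' learnable front-end: the function names, frame length, hop size, FFT size, and filter counts are all assumed values, and the logarithm between the mel filterbank and the DCT is a nonlinear step that sits outside the four linear transforms.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters as an (n_mels, n_fft // 2 + 1) matrix."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    fbank = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fbank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank

def dct_matrix(n_ceps, n_mels):
    """Unnormalised DCT-II basis as an (n_ceps, n_mels) matrix."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))

def static_mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
                n_fft=512, n_mels=40, n_ceps=20):
    """Static MFCCs; all default parameter values are assumptions."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx]                                  # (n_frames, frame_len)
    windowed = frames * np.hamming(frame_len)             # 1) windowing
    spectrum = np.fft.rfft(windowed, n=n_fft)             # 2) DFT
    power = np.abs(spectrum) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T  # 3) mel filterbank
    log_mel = np.log(mel_energy + 1e-10)                  # nonlinear compression
    return log_mel @ dct_matrix(n_ceps, n_mels).T         # 4) DCT

# Usage on one second of synthetic noise (placeholder input, not real speech).
rng = np.random.default_rng(0)
features = static_mfcc(rng.standard_normal(16000))
```

Each numbered step is a fixed matrix or element-wise product, which is what makes it a natural candidate for replacement by a trainable parameter of the same shape.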

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

Author: Kinnunen Tomi
Liu Xuechen
Sahidullah Md
Publication date: 01/01/2020

Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. While there are alternative feature extraction methods based on phase, prosody and long-term temporal operations, they have not been extensively studied with DNN-based methods. We aim to fill this gap by providing an extensive re-assessment of 14 feature extractors on the VoxCeleb and SITW datasets. Our findings reveal that features equipped with techniques such as spectral centroids, group delay function, and integrated noise suppression provide promising alternatives to MFCCs for deep speaker embedding extraction. Experimental results demonstrate up to 16.3% (VoxCeleb) and 25.1% (SITW) relative decrease in equal error rate (EER) compared with the baseline. Comment: Accepted to Interspeech 202
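The relative EER figures quoted in these abstracts follow the usual definition of a relative reduction with respect to a baseline. A minimal sketch of how such a number can be obtained from verification scores is given below; the threshold sweep is a simple approximation, and the function names and toy scores are made up for illustration rather than taken from any of the papers.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: non-target trials scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer

def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent with respect to the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer

# Toy scores (hypothetical): higher means "more likely the same speaker".
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
toy_gain = relative_reduction(4.0, 3.0)  # e.g. EER drops from 4.0% to 3.0%
```

A relative figure by itself says nothing about the absolute operating point: a 25% relative reduction can mean 4.0% to 3.0% or 0.4% to 0.3%.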

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Hydrogen sulfide induced by nitric oxide mediates ethylene-induced stomatal closure of Arabidopsis thaliana

Author: GuoHua Liu
Jing Liu
LiXia Hou
Xin Liu
XueChen Wang
Publication venue: Springer Nature
Publication date: 01/01/2011

Springer - Publisher Connector

Learnable MFCCs for Speaker Verification

Author: Kinnunen Tomi
Liu Xuechen
Sahidullah Md
Publication venue: HAL CCSD
Publication date: 22/05/2021

We propose a learnable mel-frequency cepstral coefficient (MFCC) front-end architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to be adapted to data flexibly. In practice, we formulate data-driven versions of the four linear transforms in a standard MFCC extractor: windowing, discrete Fourier transform (DFT), mel filterbank and discrete cosine transform (DCT). The reported results reach up to 6.7% (VoxCeleb1) and 9.7% (SITW) relative improvement in terms of equal error rate (EER) over static MFCCs, without additional tuning effort. Index Terms: speaker verification, feature extraction, mel-frequency cepstral coefficients (MFCCs)

INRIA a CCSD electronic archive server

Learnable Nonlinear Compression for Robust Speaker Verification

Author: Kinnunen Tomi
Liu Xuechen
Sahidullah Md
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 10/02/2022

In this study, we focus on nonlinear compression methods for spectral features in speaker verification based on deep neural networks. We consider different kinds of channel-dependent (CD) nonlinear compression methods optimized in a data-driven manner. Our methods are based on power nonlinearities and dynamic range compression (DRC). We also propose a multi-regime (MR) design for the nonlinearities, aimed at improving robustness. Results on VoxCeleb1 and VoxMovies data demonstrate improvements brought by the proposed compression methods over both the commonly used logarithm and their static counterparts, especially for those based on the power function. While CD generalization improves performance on VoxCeleb1, MR provides more robustness on VoxMovies, with a maximum relative equal error rate reduction of 21.6%.
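To make the contrast between the logarithm and a channel-dependent power nonlinearity concrete, the sketch below applies both to a small matrix of mel-filterbank energies. It is a simplified illustration under stated assumptions: the exponents are fixed by hand here, whereas the abstract describes them as optimized from data, and the DRC and multi-regime variants are not covered. The function names and input values are hypothetical.

```python
import numpy as np

def log_compress(mel_energy, eps=1e-10):
    """Conventional static log compression of filterbank energies."""
    return np.log(mel_energy + eps)

def power_compress(mel_energy, alpha):
    """Power-law compression.

    alpha may be a scalar (one exponent shared by all channels) or an array
    with one exponent per mel channel (the channel-dependent case).
    """
    return np.power(mel_energy, alpha)

# One frame with three mel channels (made-up energies).
mel_energy = np.array([[1.0, 16.0, 81.0]])
shared = power_compress(mel_energy, 0.5)                         # same exponent everywhere
per_channel = power_compress(mel_energy, np.array([1.0, 0.5, 0.25]))  # one exponent per channel
log_version = log_compress(mel_energy)
```

Because the exponent broadcasts along the channel axis, the channel-dependent case needs no change to the calling code; in a trainable setting the array of exponents would simply become a model parameter.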

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation

Author: Kinnunen Tomi
Liu Xuechen
Sahidullah Md
Publication venue: International Speech Communication Association
Publication date: 28/06/2022

In this paper, we address the problem of enhancing the spoofing robustness of the automatic speaker verification (ASV) system without relying primarily on a separate countermeasure module. We start from the standard ASV framework of the ASVspoof 2019 baseline and approach the problem from the back-end classifier based on probabilistic linear discriminant analysis. We employ three unsupervised domain adaptation techniques to optimize the back-end using the audio data in the training partition of the ASVspoof 2019 dataset. We demonstrate notable improvements in both logical and physical access scenarios, especially in the latter, where the system is attacked by replayed audio, with a maximum of 36.1% and 5.3% relative improvement on bonafide and spoofed cases, respectively. We perform additional studies such as per-attack breakdown analysis, data composition, and score-level integration with a countermeasure system with a Gaussian back-end.

INRIA a CCSD electronic archive server

Hal-Diderot

Optimized Power Normalized Cepstral Coefficients Towards Robust Deep Speaker Verification

Author: Kinnunen Tomi
Liu Xuechen
Sahidullah Md
Publication venue: HAL CCSD
Publication date: 13/12/2021

After their introduction to robust speech recognition, power normalized cepstral coefficient (PNCC) features were successfully adopted for other tasks, including speaker verification. However, as PNCC is a feature extractor with long-term operations on the power spectrogram, its temporal processing and amplitude scaling steps dedicated to environmental compensation may be redundant. Further, they might suppress intrinsic speaker variations that are useful for speaker verification based on deep neural networks (DNNs). Therefore, in this study, we revisit and optimize PNCCs by ablating the medium-time processor and by introducing channel energy normalization. Experimental results with a DNN-based speaker verification system indicate substantial improvement over baseline PNCCs in both in-domain and cross-domain scenarios, reflected by maximum relative equal error rate reductions of 5.8% and 61.2% on VoxCeleb1 and VoxMovies, respectively.

INRIA a CCSD electronic archive server